Relational Learning of Pattern-Match Rules for Information Extraction
Abstract
Information extraction is a form of shallow text processing that locates a specified set of relevant items in natural-language documents. Such systems can be useful, but they require domain-specific knowledge and rules and are time-consuming and difficult to build by hand. This makes information extraction a good testbed for applying machine learning techniques to natural language processing. This paper presents a system, RAPIER, that takes pairs of documents and filled templates and induces pattern-match rules that directly extract fillers for the slots in the template. The learning algorithm incorporates techniques from several inductive logic programming systems and learns unbounded patterns that include constraints on the words and part-of-speech tags surrounding the filler. Encouraging results are presented on learning to extract information from computer job postings in the newsgroup misc.jobs.offered.
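To make "pattern-match rules with constraints on words and part-of-speech tags" concrete, here is a minimal sketch of how such a rule might be applied to tagged text. It is a simplified illustration, not RAPIER's actual rule representation or matching procedure. It assumes a hypothetical rule shape with a pre-filler part, a filler, and a post-filler part, where each part constrains a token's word and/or tag. The names `Constraint`, `Rule`, and `extract`, and the Penn Treebank-style tags in the demo, are assumptions for illustration. The abstract describes unbounded patterns; this sketch caps the filler length at `max_filler_len` to keep the matcher short.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)


@dataclass(frozen=True)
class Constraint:
    """Hypothetical constraint on one token; an empty set means 'any value'."""
    words: FrozenSet[str] = frozenset()
    tags: FrozenSet[str] = frozenset()

    def ok(self, tok: Token) -> bool:
        word, tag = tok
        word_ok = not self.words or word.lower() in self.words
        tag_ok = not self.tags or tag in self.tags
        return word_ok and tag_ok


@dataclass(frozen=True)
class Rule:
    """Hypothetical slot rule: pre-filler, filler, and post-filler constraints."""
    pre: Tuple[Constraint, ...]
    filler: Constraint        # must hold for every token of the filler
    max_filler_len: int       # simplification: the filler length is bounded
    post: Tuple[Constraint, ...]


def _matches(cons: Sequence[Constraint], toks: Sequence[Token]) -> bool:
    """True if each constraint accepts the token at the same position."""
    return len(cons) == len(toks) and all(c.ok(t) for c, t in zip(cons, toks))


def extract(rule: Rule, toks: List[Token]) -> Optional[str]:
    """Return the first (shortest) filler the rule matches, or None."""
    n = len(toks)
    for start in range(len(rule.pre), n):
        # The pre-filler constraints must match the tokens just before `start`.
        if not _matches(rule.pre, toks[start - len(rule.pre):start]):
            continue
        for length in range(1, rule.max_filler_len + 1):
            end = start + length
            if end + len(rule.post) > n:
                break
            # If the newest filler token fails, longer fillers fail too.
            if not rule.filler.ok(toks[end - 1]):
                break
            if _matches(rule.post, toks[end:end + len(rule.post)]):
                return " ".join(word for word, _ in toks[start:end])
    return None


if __name__ == "__main__":
    # Hypothetical "location" rule: after "in", 1-2 proper nouns, then a comma.
    location = Rule(
        pre=(Constraint(words=frozenset({"in"})),),
        filler=Constraint(tags=frozenset({"NNP"})),
        max_filler_len=2,
        post=(Constraint(words=frozenset({","})),),
    )
    posting = [("The", "DT"), ("position", "NN"), ("is", "VBZ"), ("in", "IN"),
               ("Austin", "NNP"), (",", ","), ("TX", "NNP"), (".", ".")]
    print(extract(location, posting))  # → Austin
```

In this representation, generalizing a rule amounts to loosening its constraints, for example by widening a word set or dropping it. A learner working from document/template pairs could take steps of that kind. The learning step itself is not shown here.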
Similar resources
Bottom-Up Relational Learning of Pattern Matching Rules for Information Extraction
Information extraction is a form of shallow text processing that locates a specified set of relevant items in a natural-language document. Systems for this task require significant domain-specific knowledge and are time-consuming and difficult to build by hand, making them a good application for machine learning. We present an algorithm, RAPIER, that uses pairs of sample documents and filled te...
Learning Relational Structure for Temporal Relation Extraction
Recently there has been a lot of interest in using Statistical Relational Learning (SRL) models for Information Extraction (IE). One of the important IE tasks is extraction of temporal relations between events and time expressions (timex). SRL methods that use hand-written rules have been proposed for various IE tasks. In contrast, we propose an approach that employs structure learning in SRL t...
Learning Relational Features with Backward Random Walks
The path ranking algorithm (PRA) has been recently proposed to address relational classification and retrieval tasks at large scale. We describe Cor-PRA, an enhanced system that can model a larger space of relational rules, including longer relational rules and a class of first order rules with constants, while maintaining scalability. We describe and test faster algorithms for searching for th...
Towards First-Order Random Walk Inference
Path Ranking Algorithm (PRA) addresses classification and retrieval tasks using learned combinations of labeled paths through a graph. Unlike most Statistical Relational Learning (SRL) methods, PRA scales to large data sets but uses a limited set of paths in its models—ones that correspond to short first order rules with no constants. We consider extending PRA in two ways—learning paths that co...
Information Extraction from Patients' Free Form Documentation
The paper presents two rule-based information extraction (IE) systems for two types of patients’ documentation in Polish. For both document types, values of sets of attributes were assigned using specially designed grammars. 1 Method/General Assumptions. Various rule-based, statistical, and machine learning methods have been developed for the purpose of information extraction. Unfortunately, they have ...
Publication date: 1997